Flexible goal-directed behavior requires a
performance monitoring system to monitor
behavioral consequences in order to detect
the need for further adjustments and
control. When a failure in performance is
detected by the monitoring system, signals
are transmitted to the brain structures
responsible for
control implementation. Evidence points to
the anterior cingulate cortex (ACC)
(Carter et al., 1998; Gehring and Knight,
2000; MacDonald et al., 2000; Ferdinand
et al., 2012) and the lateral prefrontal
cortex (lPFC) (MacDonald et al., 2000;
Ridderinkhof et al., 2004a,b) as the neural
correlates of the performance monitoring
and control implementation systems,
respectively. The interaction of these two systems
appears to modulate some components 
of event-related brain potentials (ERPs) 
linked with performance monitoring such 
as the error-related negativity (ERN), the 
N200, and the feedback-related negativity
(FRN) (Gruendler et al., 2011). The
ERN is an ERP component that begins
close to the time of the erroneous response
in speeded response time tasks and peaks
about 100 ms later (Gehring et al., 1993).
The N200 is another negative deflection in 
ERP that peaks between 200 and 400 ms 
after stimulus onset, prior to the response 
execution, on correct trials of cognitive 
control experiments (Olvet and Hajcak, 
2008). The FRN, one of the most studied
components, is a negative-going deflection
observed 230-330 ms following outcome
presentation (Miltner et al., 1997) in
gambling and trial-and-error learning tasks
(Holroyd et al., 2006). Source localization
studies show the neural source of the
FRN to be located most probably in the
ACC (Miltner et al., 1997; Gehring and
Willoughby, 2002; Bellebaum and Daum,
2008; Hauser et al., 2014).

The central question in the interaction 
of performance monitoring and control 
systems is how the brain determines the 
need to recruit the intervention of
control structures. The reinforcement learning
(RL) account of performance monitoring
and control is one of the influential
theories in the field (Holroyd and Coles, 2002;
Holroyd et al., 2005). The theory is based
on physiological evidence revealing
the similarity between the phasic activity of
the mesencephalic dopamine system and
reward prediction errors (RPEs) in
temporal difference models of learning (Suri,
2002). The theory holds that the
monitor is located in the basal ganglia, which
produces RPE signals that indicate when 
events are better or worse than expected. 
These RPEs are used by the ACC to 
improve performance on the task at hand 
(Holroyd et al., 2005). According to the 
RL model, negative RPEs sent to the ACC 
generate the ERN and the FRN. Another
prominent theory, the conflict-monitoring
theory (CMT), proposes that the performance
monitoring system monitors for the
coactivation of mutually incompatible
response tendencies, or conflict, during
response selection. The CMT suggests that
the ACC detects a response-conflict signal
and sends this information to the
dorsolateral prefrontal cortex for further
adjustment and control (Botvinick et al., 2001;
Yeung et al., 2004). Based on this theory,
the N2 and the ERN can be described
in terms of a conflict signal. The CMT
argues that the N2 and the ERN are
electrophysiological correlates of
pre-response and post-response conflict
signals, respectively. However, since no
motor response exists after external
feedback presentation, the CMT cannot
account for the phenomena commencing
after feedback onset, e.g.,
the FRN (Ullsperger et al., 2014). In our
previous studies, we have explained the
significance of integrating the
computational models associated with the RL and
the CMT (Zendehrouh et al., 2013, 2014).
Since the unification of these two
theories depends centrally on the definition
of the conflict signal, we propose a
hypothetical cost-conflict monitor in the
brain that extends the CMT to account for
post-feedback activity in feedback-based
learning tasks. Based on this proposal, the
FRN can be described using a cost-conflict signal.
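
The two signals contrasted above can be sketched computationally. The following is a minimal illustration, not the implementation used in the cited models: the RPE follows the standard temporal-difference form, and the conflict measure is an energy-style product of co-active, mutually incompatible response units in the spirit of the CMT. The function names, the discount factor `gamma`, and the incompatibility weights are assumptions made for this example.

```python
def td_rpe(reward, value_current, value_next, gamma=0.9):
    """Temporal-difference reward prediction error.

    Positive when events are better than expected, negative when worse.
    gamma is an assumed discount factor, not a value from the article.
    """
    return reward + gamma * value_next - value_current


def response_conflict(activations, incompatibility):
    """Energy-style conflict: sum of a_i * a_j * w_ij over unit pairs.

    incompatibility[i][j] > 0 marks responses i and j as mutually
    incompatible (assumed layout: symmetric, zero diagonal).
    """
    total = 0.0
    n = len(activations)
    for i in range(n):
        for j in range(i + 1, n):
            total += activations[i] * activations[j] * incompatibility[i][j]
    return total


# An expected reward (value 0.8) is omitted on a terminal step:
# the outcome is worse than expected, so the RPE is negative.
delta = td_rpe(reward=0.0, value_current=0.8, value_next=0.0)

# Two incompatible responses (e.g., left vs. right key press).
w = [[0.0, 1.0], [1.0, 0.0]]
low_conflict = response_conflict([0.9, 0.1], w)   # one response dominates
high_conflict = response_conflict([0.6, 0.6], w)  # both responses co-active
```

In this sketch the conflict measure needs co-active response units, which is why it has nothing to operate on after feedback onset, whereas the RPE is defined whenever an outcome is compared with an expectation.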

The basis for our hypothetical cost-conflict
monitor is threefold: (1) Theoretically,
conflict can occur anywhere within the
information processing system (Carter
and van Veen, 2007). (2) Conflict-driven
control is domain-specific and has been
suggested to be mediated by multiple,
independent, parallel-operating conflict
monitor-controller loops in the brain
(Egner, 2008). (3) The appraisal of costs
and benefits associated with different
candidate actions is a key aspect of
decision-making.

Delay-based and effort-based costs (the
latter being the effort needed to perform
an action in order to obtain a reward) are
two types of costs that bias decision
making (Floresco et al., 2008). In delay-based
tasks, as time passes, the subjective
value of a reward is discounted
hyperbolically (Green and Myerson, 2004).
Likewise, the aversiveness of a negative
event decreases hyperbolically with time
(Murphy et al., 2001). Evidence suggests
that discounting occurs across many reward
types, reward magnitudes, and timescales,
even on the order of tens of milliseconds
(Haith et al., 2012). In this paper,
it is hypothesized that in feedback-based 
learning tasks, the participants are faced 
with delay-based evaluations. Therefore, 
in these tasks, the time interval between 
response selection and feedback presen- 
tation gives rise to a cost. This delay 
elevates the cost of the rewarded out- 
come and reduces the cost of the non- 
rewarded outcome associated with the 
selected action. In fact, the conflict can 
be produced by simultaneous activation of 
the expected costs of possible outcomes 
that are mutually exclusive. Therefore, 
when a cost-conflict is detected by the 
monitoring system, the regulatory mech- 
anism implements the required control, 
e.g., by modifying the excitatory weights 
to the response units. The cost-conflict
signal that may arise between expected
costs can reflect the amount of transient
subjective uncertainty about what will
happen, which increases with time (delay)
until the actual outcome is received. The
cost-conflict signal can also be viewed in 
the context of the emerging field of neu- 
roeconomics as an ambiguity signal that 
may be present during decision-making. 
Ambiguity is defined as a lack of
confidence in probability assignment to the
possible outcomes (Kishida et al., 2010).
This is consistent with investigations
suggesting the existence of an
ambiguity-sensitive mechanism in the ventromedial
prefrontal cortex (vmPFC) (Glimcher and
Rustichini, 2004), and also with the role
of this area in delay cost coding (Prevost
et al., 2010; Rushworth et al., 2011; Dreher,
2013).
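
The hyperbolic discounting invoked above can be made concrete with a short sketch. It uses the simple one-parameter hyperbola V = A / (1 + kD), one common form in the discounting literature; the discount rate `k`, the outcome magnitudes, and the delays are hypothetical values chosen for illustration, and the article does not commit to a specific equation.

```python
def hyperbolic_value(amount, delay, k=1.0):
    """Subjective magnitude of an outcome of size `amount` after `delay`.

    Uses V = amount / (1 + k * delay). k is an assumed discount rate,
    expressed in the inverse of the units used for `delay`.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return amount / (1.0 + k * delay)


# Hypothetical response-to-feedback delays, in seconds.
delays_s = [0.0, 0.5, 1.0, 2.0]

# Subjective value of a unit reward at each delay.
reward_values = [hyperbolic_value(1.0, d) for d in delays_s]

# Aversiveness of a unit negative outcome, discounted the same way
# (an assumption: the two outcomes may have different rates in practice).
loss_aversiveness = [hyperbolic_value(1.0, d) for d in delays_s]
```

Under these assumptions both the value of the rewarded outcome and the aversiveness of the non-rewarded outcome shrink as the feedback delay grows, which is the sense in which the delay changes the expected costs attached to the selected action.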

This proposal can be validated by
performing simple gambling games or
probabilistic reinforcement learning tasks
with feedback-timing manipulations at the
timescale of milliseconds while measuring
brain responses with functional magnetic
resonance imaging (fMRI) and
electroencephalography (EEG) to identify the
contributions of the ACC and the vmPFC
in those tasks. In particular, studying
how addicted and depressed individuals,
who show anomalies in value-based decision
making (Sharp et al., 2012), behave in
these tasks could be informative.
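
As a rough sketch of what a feedback-timing manipulation could look like in practice, the snippet below builds a trial list in which the delay between response and feedback varies across trials. Everything in it is hypothetical: the delay levels, the reward probability, and the `make_trials` helper are illustrative choices, not a design taken from the article or the cited studies.

```python
import random


def make_trials(n_trials, delays_ms=(200, 500, 1000), p_reward=0.5, seed=0):
    """Build a reproducible trial list for a simple gambling task.

    Each trial gets a response-to-feedback delay drawn from `delays_ms`
    and a probabilistic outcome. All parameter values are assumptions.
    """
    rng = random.Random(seed)  # local generator keeps the schedule reproducible
    trials = []
    for index in range(n_trials):
        trials.append({
            "trial": index,
            "feedback_delay_ms": rng.choice(delays_ms),
            "rewarded": rng.random() < p_reward,
        })
    return trials


trials = make_trials(12)
```

Sorting EEG epochs by `feedback_delay_ms` would then allow the FRN to be compared across delay levels, which is the kind of contrast the proposal calls for.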

Therefore, the cost-conflict monitor, as
an independent loop operating in parallel
to the response-conflict monitor, detects
the conflict between the costs of likely
outcomes of the selected action and uses
this information to adjust future behavior,
thereby implementing trial-by-trial
adjustments. Admittedly, this proposal is
speculative, and further experimental
studies are needed to evaluate its
merit. However, the proposal can provide
promising avenues toward the unification
of the computational models associated with
the RL and the CMT.